Learning to Filter Spam E-Mail: A Comparison of a Naive Bayesian and a Memory-Based Approach

Authors

  • Ion Androutsopoulos
  • Georgios Paliouras
  • Vangelis Karkaletsis
  • Georgios Sakkis
  • Constantine D. Spyropoulos
  • Panagiotis Stamatopoulos
Abstract

We investigate the performance of two machine learning algorithms in the context of anti-spam filtering. The increasing volume of unsolicited bulk e-mail (spam) has generated a need for reliable anti-spam filters. Filters of this type have so far been based mostly on keyword patterns that are constructed by hand and perform poorly. The Naive Bayesian classifier has recently been suggested as an effective method for automatically constructing anti-spam filters with superior performance. We thoroughly investigate the performance of the Naive Bayesian filter on a publicly available corpus, contributing towards standard benchmarks. At the same time, we compare the performance of the Naive Bayesian filter to an alternative memory-based learning approach, after introducing suitable cost-sensitive evaluation measures. Both methods achieve very accurate spam filtering, clearly outperforming the keyword-based filter of a widely used e-mail reader.
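
To make the comparison in the abstract concrete, the following is a minimal sketch of the two kinds of filter it names: a Naive Bayesian classifier and a memory-based (k-nearest-neighbour) classifier, each with a cost-sensitive decision rule. Everything here is an illustrative assumption rather than the authors' implementation: the toy corpus TRAIN, the function names, the binary word-presence features, the Laplace smoothing, the Jaccard similarity, and the parameter lam, which says how many times worse it is to block a legitimate message than to let a spam message through. A message is blocked only when the estimated spam probability exceeds lam / (1 + lam).

```python
import math
from collections import Counter

# Hypothetical toy corpus, for illustration only (not the paper's corpus).
TRAIN = [
    ("win free money now", "spam"),
    ("free prize claim now", "spam"),
    ("cheap money offer free", "spam"),
    ("meeting agenda for monday", "legit"),
    ("lunch on monday with the team", "legit"),
    ("project report draft attached", "legit"),
]

def tokenize(text):
    """Binary bag of words: the set of lowercase tokens in the message."""
    return set(text.lower().split())

def train_nb(messages):
    """Count, per class, how many messages contain each word."""
    doc_counts = Counter(label for _, label in messages)
    word_counts = {"spam": Counter(), "legit": Counter()}
    vocab = set()
    for text, label in messages:
        words = tokenize(text)
        word_counts[label].update(words)
        vocab |= words
    return doc_counts, word_counts, vocab

def spam_probability(model, text):
    """P(spam | message) under the naive independence assumption."""
    doc_counts, word_counts, vocab = model
    total = sum(doc_counts.values())
    words = tokenize(text)
    log_score = {}
    for label in ("spam", "legit"):
        score = math.log(doc_counts[label] / total)  # class prior
        for w in vocab:
            # Laplace-smoothed estimate of P(word present | class)
            p = (word_counts[label][w] + 1) / (doc_counts[label] + 2)
            score += math.log(p if w in words else 1.0 - p)
        log_score[label] = score
    # Normalise in log space to avoid underflow.
    top = max(log_score.values())
    spam = math.exp(log_score["spam"] - top)
    legit = math.exp(log_score["legit"] - top)
    return spam / (spam + legit)

def nb_is_spam(model, text, lam=1.0):
    """Block only if the spam probability exceeds lam / (1 + lam)."""
    return spam_probability(model, text) > lam / (1.0 + lam)

def knn_is_spam(messages, text, k=3, lam=1.0):
    """Memory-based rule: vote among the k most similar stored messages."""
    words = tokenize(text)

    def jaccard(other_text):
        other = tokenize(other_text)
        union = words | other
        return len(words & other) / len(union) if union else 0.0

    neighbours = sorted(messages, key=lambda m: jaccard(m[0]), reverse=True)[:k]
    spam_share = sum(1 for _, label in neighbours if label == "spam") / k
    return spam_share > lam / (1.0 + lam)

MODEL = train_nb(TRAIN)
```

With lam=1 both errors cost the same and the threshold is 0.5. Raising lam makes the filter more reluctant to block, which is the kind of trade-off a cost-sensitive evaluation is meant to expose.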

Similar resources

A Classification Method for E-mail Spam Using a Hybrid Approach for Feature Selection Optimization

Spam is unwanted email that harms communications around the world. It is a growing problem for personal email, so detecting it is essential. Machine learning is well suited to this problem, since its adaptive nature lets it learn the patterns required for classification. Nonetheless, in spam detection, there are a large num...

A New Hybrid Approach of K-Nearest Neighbors Algorithm with Particle Swarm Optimization for E-Mail Spam Detection

Email is one of the fastest and most economical forms of communication. The growing number of email users has increased the volume of spam in recent years. Spam not only costs users money, time, and bandwidth, but has also become a risk to the efficiency, reliability, and security of a network. Spam developers are always trying to find ways to evade existing filters, therefore new filters to de...

A New Approach to Spam Mail Detection

The ever-increasing menace of spam is bringing down productivity. More than 70% of email messages are spam, and it has become a challenge to separate such messages from the legitimate ones. I have developed a spam identification engine which employs a naive Bayesian classifier to identify spam. A new concept-based mining model that analyzes terms on the sentence, document is introduced. The...

Structure Learning in Bayesian Networks Using Asexual Reproduction Optimization

A new structure learning approach for Bayesian networks (BNs) based on asexual reproduction optimization (ARO) is proposed in this letter. ARO can be essentially considered as an evolutionary based algorithm that mathematically models the budding mechanism of asexual reproduction. In ARO, a parent produces a bud through a reproduction operator; thereafter the parent and its bud compete to survi...

Machine Learning methods for E-mail Classification

The increasing volume of unsolicited bulk e-mail (also known as spam) has generated a need for reliable anti-spam filters. Using a classifier based on machine learning techniques to automatically filter out spam e-mail has drawn many researchers' attention. In this paper we review some of the most popular machine learning methods (Bayesian classification, k-NN, ANNs, SVMs, Artificial immune system...

Journal title:
  • CoRR

Volume: cs.CL/0009009

Publication date: 2000